Probabilistic Planning via Linear Value-approximation of First-order MDPs

Authors

  • Scott Sanner
  • Craig Boutilier
Abstract

We describe a probabilistic planning approach that translates a PPDDL planning problem description into a first-order MDP (FOMDP) and uses approximate solution techniques for FOMDPs to derive a value function and corresponding policy. Our FOMDP solution techniques represent the value function linearly w.r.t. a set of first-order basis functions and compute suitable weights using lifted, first-order extensions of approximate linear programming (FOALP) and approximate policy iteration (FOAPI) for MDPs. We additionally describe techniques for automatic basis function generation and for the decomposition of universal rewards, both of which are crucial to achieving autonomous and tractable FOMDP solutions for many planning domains.

From PPDDL to First-order MDPs

It is straightforward to translate a PPDDL [12] planning domain into the situation calculus representation used for first-order MDPs (FOMDPs); the primary part of this translation is the conversion of PPDDL action schemata to effect axioms in the situation calculus, which are then compiled into the successor-state axioms [8] used in the FOMDP description. In the following algorithm description, we assume that we are given a FOMDP specification, and we describe techniques for approximating its value function linearly w.r.t. a set of first-order basis functions. From this value function it is straightforward to derive a first-order policy representation that can be used for action selection in the original PPDDL planning domain.

Linear Value Approximation for FOMDPs

The following explanation assumes that the reader is familiar with the FOMDP formalism and operators used by Boutilier, Reiter and Price [2] and extended by Sanner and Boutilier [9]. In the following text, we refer to function symbols A_i(~x) that correspond to parameterized actions in the FOMDP; for every action and fluent, we expect that a successor-state axiom has been defined. The reader should be familiar with the notation and use of the rCase, vCase, and pCase case statements for representing the respective FOMDP reward, value, and transition functions. The reader should also be familiar with the case operators ⊕, ⊗, ∪, and Regr(·) [2], as well as FODTR(·), B^{A(~x)}(·), and B^{A}(·) [9].

Value Function Representation

Following [9], we represent a value function as a weighted sum of k first-order basis functions in case statement format, denoted bCase_i(s), each containing a small number of formulae that provide a first-order abstraction of the state space:

    vCase(s) = ⊕_{i=1}^{k} w_i · bCase_i(s)    (1)

Using this format, we can often achieve a reasonable approximation of the exact value function by exploiting the additive structure inherent in many real-world problems (e.g., additive reward functions or problems with independent subgoals). Unlike exact solution methods, where value functions can grow exponentially in size during the solution process and must be logically simplified [2], here we maintain the value function in a compact form that requires no simplification, just the discovery of good weights.

We can easily apply the FOMDP backup operator B^{A(~x)} [9] to this representation and obtain some simplification as a result of the structure in Eq. 1. Exploiting the properties of the Regr and ⊕ operators, we find that the backup B^{A(~x)} of a linear combination of basis functions is simply the reward combined with the linear combination of the first-order decision-theoretic regression (FODTR) of each basis function [9]:

    B^{A(~x)}(⊕_i w_i · bCase_i(s)) = rCase(s, a) ⊕ (⊕_i w_i · FODTR(bCase_i(s), A(~x)))    (2)

A corresponding definition of B^{A} follows directly [9].
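
To make the case-statement machinery above concrete, here is a minimal, hypothetical Python sketch of Eq. 1 at the ground level: case statements become lists of (predicate, value) partitions over concrete states, ⊕ becomes a cross-sum, and the value function is kept as a weighted combination that is never flattened or simplified. All names (Case, case_add, linear_vcase) and the toy basis functions are assumptions of this sketch, not code from the paper; a real FOMDP implementation operates on first-order formulae rather than ground states.

```python
# Hypothetical, grounded illustration of Eq. 1: vCase(s) = (+)_i w_i * bCase_i(s).
# Real FOMDP case statements partition states by first-order formulae; here each
# "formula" is a Python predicate over a ground state to keep the sketch runnable.
from itertools import product

# A case statement: a list of (predicate, value) partitions. Predicates are
# assumed to be mutually exclusive and exhaustive over the state space.
Case = list  # [(predicate: state -> bool, value: float), ...]

def case_value(case, state):
    """Evaluate a case statement on a concrete state."""
    for pred, val in case:
        if pred(state):
            return val
    raise ValueError("case partitions must be exhaustive")

def case_add(a, b):
    """The (+) operator: cross-sum of two case statements."""
    return [(lambda s, p=p, q=q: p(s) and q(s), v + w)
            for (p, v), (q, w) in product(a, b)]

def case_scale(case, weight):
    return [(pred, weight * val) for pred, val in case]

def linear_vcase(weights, basis):
    """vCase(s) = (+)_i w_i * bCase_i(s), kept in compact case form."""
    result = [(lambda s: True, 0.0)]
    for w, b in zip(weights, basis):
        result = case_add(result, case_scale(b, w))
    return result

# Toy example: two basis functions over ground states {0, 1, 2, 3}.
b1 = [(lambda s: s >= 2, 1.0), (lambda s: s < 2, 0.0)]        # "near goal"
b2 = [(lambda s: s % 2 == 0, 1.0), (lambda s: s % 2 == 1, 0.0)]
v = linear_vcase([10.0, 3.0], [b1, b2])
print([case_value(v, s) for s in range(4)])  # [3.0, 0.0, 13.0, 10.0]
```

Even at this scale the appeal of the representation is visible: changing the weights re-values every abstract partition at once, which is exactly the structure FOALP and FOAPI exploit when searching for good weights.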
It is important to note that during the application of these operators we never explicitly ground states or actions, in effect achieving both state and action space abstraction.

First-order Approximate Linear Programming

First-order approximate linear programming (FOALP) was introduced by Sanner and Boutilier [9]. Here we present a linear program (LP) with first-order constraints that generalizes the ALP solution for MDPs to FOMDPs:

    Variables: w_i ; ∀i ≤ k
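
The excerpt breaks off after the variable declaration, so the objective and first-order constraints of FOALP are not shown here. For intuition only, below is a grounded sketch of the standard ALP formulation that FOALP lifts: minimize the total state value subject to V(s) ≥ R(s,a) + γ·E[V(s')] for every state and action, with V restricted to the span of the basis functions. The toy MDP, the constant basis function added to keep the LP feasible, and the use of scipy.optimize.linprog are all assumptions of this sketch; FOALP itself replaces the per-state constraints with first-order constraints over case partitions and never enumerates ground states.

```python
# Grounded sketch of approximate linear programming (ALP), the MDP-level
# scheme that FOALP lifts to first-order constraints. The toy MDP and all
# names below are illustrative assumptions, not the paper's formulation.
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
n_states, n_actions = 4, 2

# Toy MDP: P[a][s, s'] are row-stochastic transition matrices; R[s, a] rewards.
rng = np.random.default_rng(0)
P = [rng.dirichlet(np.ones(n_states), size=n_states) for _ in range(n_actions)]
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Basis functions b_i(s) as columns; a constant basis keeps the LP feasible.
B = np.stack([np.ones(n_states),
              (np.arange(n_states) >= 2).astype(float),
              (np.arange(n_states) % 2 == 0).astype(float)], axis=1)
k = B.shape[1]

# Objective: minimize sum_s V(s) = sum_s sum_i w_i * b_i(s).
c = B.sum(axis=0)

# Constraints, one per (state, action) pair:
#   sum_i w_i * (b_i(s) - gamma * sum_s' P(s'|s,a) * b_i(s')) >= R(s, a),
# i.e. the approximate value function must dominate every one-step backup.
A_ub, b_ub = [], []
for a in range(n_actions):
    for s in range(n_states):
        A_ub.append(-(B[s] - gamma * P[a][s] @ B))
        b_ub.append(-R[s, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * k)
print("basis weights w:", res.x)
```

In the lifted version, each constraint above stands for an entire equivalence class of states described by a first-order formula, which is what lets FOALP scale independently of the number of domain objects.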

Similar resources

Practical Linear Value-approximation Techniques for First-order MDPs

Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to determine suitable weights. This approach offers the advantage that it does not require simplification of the first-order value function, and allows one to so...

Structured Possibilistic Planning Using Decision Diagrams

Qualitative Possibilistic Mixed-Observable MDPs (πMOMDPs), generalizing π-MDPs and π-POMDPs, are models well suited to planning under uncertainty with mixed observability when the transition, observation, and reward functions are not precisely known but can be qualitatively described. The functions defining the model, as well as intermediate calculations, are valued in a finite possibilistic scale L, whic...

Probabilistic Planning with Risk-Sensitive Criterion

Probabilistic planning models and, in particular, Markov Decision Processes (MDPs), Partially Observable Markov Decision Processes (POMDPs), and Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) have been extensively used by the AI and decision-theoretic communities for planning under uncertainty. Typically, solvers for probabilistic planning models find policies that min...

Practical solution techniques for first-order MDPs

Many traditional solution approaches to relationally specified decision-theoretic planning problems (e.g., those stated in the probabilistic planning domain description language, or PPDDL) ground the specification with respect to a specific instantiation of domain objects and apply a solution approach directly to the resulting ground Markov decision process (MDP). Unfortunately, the space and t...

Probabilistic Reachability Analysis for Structured Markov Decision Processes

We present a stochastic planner based on Markov Decision Processes (MDPs) that participated in the probabilistic planning track of the 2004 International Planning Competition. The planner transforms the PDDL problems into factored MDPs that are then solved with a structured policy iteration algorithm. A probabilistic reachability analysis is performed, approximating the MDP solution over the rea...

Publication date: 2005